Low Power Shift Register

ABSTRACT

A clock control circuit for a parallel in, serial out (PISO) shift register helps save power. The clock control circuit selectively clocks the shift register as it converts a parallel input to a serial output. For example, the clock control circuit may provide clock signals to the flip flops (or other buffers) in the shift register that will receive data elements provided with the parallel input. However, the clock control circuit withholds clock signals from flip flops that will not receive data elements provided with the parallel input, or that would receive only data elements they have already received. As the parallel loaded input elements propagate serially through the shift register, on each clock cycle an additional memory no longer needs to be clocked. The memory no longer needs to be clocked because that memory has already propagated its loaded input element to the following memory, and no further element provided in the N element parallel loaded data is incoming.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 13/962,699, filed Aug. 8, 2013, which claims priority to provisional application Ser. No. 61/859,425, filed Jul. 29, 2013, each of which is incorporated by reference in its entirety.

TECHNICAL FIELD

This disclosure relates to power consumption in digital logic. This disclosure also relates to reducing power consumed by shift registers, such as parallel-in, serial-out shift registers.

BACKGROUND

Rapid advances in electronics and communication technologies, driven by immense customer demand, have resulted in the widespread adoption of electronic devices of every description. These devices process digital data in many different ways, and often include shift registers to store and propagate digital data. Reducing power consumption will extend battery life, save money, and have other desirable effects.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows an example of a circuit with a shift register and clock control.

FIG. 2 shows an example of W element wide, N element deep parallel to serial shift register control.

FIG. 3 shows an example timing diagram.

FIG. 4 shows example logic for converting parallel loaded data elements to a serial output, with power saving clock control.

DETAILED DESCRIPTION

FIG. 1 shows an example of a circuit 100. The circuit 100 includes a shift register 102 connected to clock control logic 104. A parallel load circuit 106 is also present. The circuit 100 may be implemented in many different ways, several examples of which are given next. Note first, however, that in FIG. 1 one shift register 102 is present, but W shift registers, each defining a serial channel of depth N, may be stacked in parallel to create a multiple element serialized output of width W. One instance of a clock control logic 104 may be used to selectively clock as many of the serial channels as desired. The architecture is described further below with respect to FIG. 2 in conjunction with power saving estimates.

As one example implementation, the shift register 102 includes serially connected memories (e.g., 108, 110, 112, 114) configured to convert N elements received on the parallel input 116 to serial elements on a serial output 118. In that respect, the memories may be flip-flops, such as D flip-flops, or other types of memories, and the depth of the shift register 102 is N, corresponding to the number of elements loaded in parallel. The memories may include a clock input (e.g., 120, 122, 124, 126, 128). A clock pulse applied to the clock input causes the data present at the input of the memory to propagate through to the output of the memory, and thus to the next memory serially in order.

The parallel load circuit 106 provides a parallel load enable input 128 to the shift register 102. In the example shown in FIG. 1, the parallel load enable input 128 controls selection logic, such as the multiplexer 130. The selection logic determines the input received by the memories on the next clock pulse. For example, the input may be a particular parallel load element determined by the parallel load input 116. Alternatively, the input may be the output of the prior memory in the serially connected chain of memories, and a preset ‘0’ or ‘1’ input 142 for the last memory in the chain (e.g., memory 108). The parallel load enable input 128 follows the pre-parallel load enable signal 134 when a serial clock propagates the pre-parallel load enable signal 134 through the parallel load enable circuit 106 (e.g., a D flip-flop clocked with the serial clock input 132). In that respect, the parallel load enable circuit 106 delays the pre-parallel load enable signal 134 by one clock, while the pre-parallel load enable signal 134 sets the clock enable bits in the clock control circuit 104 in preparation for the parallel load and shifting operation.

The circuit 100 also includes clock control logic 104. The clock control logic 104 includes gated clock signal outputs (e.g., 135, 136, 138), and an ungated clock signal 140. The gated clock signal outputs selectively provide clock signal pulses to the shift register memories. In particular, the clock control logic 104 generates clock signals to those memories that will next receive data elements that have not already passed through the memory. In addition, the data elements are part of the parallel load input, meaning that the clock control logic 104 may prevent preset data (e.g., present on the preset input 142) from propagating through the shift register 102. In other implementations, the multiplexer 130 may be omitted, and the N-1 element of the parallel input 116 may connect directly to the data input of the memory 108. The clock control logic 104 also withholds clock signals from those memories that would only be receiving data elements that have already passed through the memory, or that are preset data.

In other words, as the parallel loaded input elements propagate serially through the shift register 102, on each clock cycle an additional memory (starting with the last memory 108) no longer needs to be clocked. The memory no longer needs to be clocked because that memory has already propagated its loaded input element to the following memory, and no further element provided in the N element loaded data is incoming. Accordingly, the clock control logic 104 withholds a clock pulse from an increasing number of memories, as the N elements loaded in parallel propagate through the shift register 102. The result is that fewer than all of the memories are clocked, and an increasing number of memories are not clocked, as the loaded data propagates out the serial output 118. In that regard, the clock control logic 104 avoids the power wasted to clock the preset input 142 or any other unneeded elements down the shift register 102.

In one implementation, the clock control logic 104 includes serially connected memories (e.g., 144, 146, 148) configured to propagate clock enable bits on clock enable inputs (e.g., 150) as the parallel input is converted to the serial output. The clock control logic 104 includes initialization logic (e.g., 152) in communication with the serially connected memories for selectively setting and selectively clearing the clock enable bits. In the example shown in FIG. 1, the initialization logic 152 is a two input multiplexer that selects between the pre-parallel load enable signal 134 and a cleared (e.g., ‘0’ value) input 156. The two input multiplexer may be implemented in or replaced by combinatorial logic.

Accordingly, when the shift register 102 is parallel loaded one clock cycle after the pre-parallel load enable signal 134, the pre-parallel load enable signal 134 has already caused each memory in the clock control logic 104 to be set with an active clock enable bit. As such, prior to the parallel load, the input to each instance of clock gating logic 154, abbreviated ‘CGL’ in FIG. 1, includes an asserted clock enable bit. The other input to the clock gating logic is the serial clock input 132. Thus, on the first clock cycle of the serial clock input 132 of a new parallel to serial conversion, each memory in the shift register 102 receives a propagation signal in the form of a clock pulse. The clock gating logic 154 may be implemented in many different ways, including as a 2 input AND gate preceded by an optional latch on the clock enable signal, as just one example.
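As a rough behavioral sketch only, one instance of clock gating logic built from a 2 input AND gate and a latch on the clock enable signal might be modeled in Python as follows. The class name `ClockGate`, the method `step`, and the choice of a latch that is transparent while the clock is low are assumptions made for illustration; the disclosure does not specify the latch polarity.

```python
class ClockGate:
    """Hypothetical model of one clock gating logic instance (AND gate plus enable latch)."""

    def __init__(self):
        # Value held by the assumed enable latch.
        self.latched_enable = False

    def step(self, clk, enable):
        """Sample the inputs once and return the gated clock level."""
        # Assumption: the latch is transparent only while the clock is low,
        # so the enable cannot change in the middle of a clock pulse.
        if not clk:
            self.latched_enable = bool(enable)
        # 2 input AND of the serial clock and the latched clock enable bit.
        return bool(clk) and self.latched_enable


gate = ClockGate()
# (clk, enable) samples: an enable change while clk is high neither
# truncates a pulse in progress nor creates a partial pulse.
samples = [(False, True), (True, True), (True, False),
           (False, False), (True, False), (True, True)]
gated = [gate.step(clk, en) for clk, en in samples]
```

In this sketch, the gated clock is high only for the second and third samples, because the enable bit was asserted while the clock was low before that pulse began.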

The parallel load enable signal 128 is asserted during a parallel load, but de-asserted for the shifting operation until the next parallel load operation occurs. Accordingly, each pulse on the serial clock input 132 causes a clock enable bit to be cleared, first by clocking in the cleared input 156, then by propagating the cleared input down the chain of serially connected memories in the clock control logic 104. The clock enable bits are cleared starting with the first memory 144, and continuing, once per serial clock cycle, to propagate through the remaining serially connected memories in the clock control logic 104 from memory 144 to memory 146 to memory 148. Accordingly, each clock cycle, one fewer memory in the shift register 102 receives a serial clock pulse, but since the memories not being clocked no longer propagate any data needed by subsequent memories, the serial output 118 still delivers a serialized version of the N elements loaded in parallel.
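The behavior described above may be illustrated with a minimal Python simulation of one parallel to serial conversion. This is a sketch under stated assumptions and not the disclosed circuit: the function name `piso_convert` and the variables `regs`, `enables`, and `pulses` are hypothetical, position 0 stands for the first memory (e.g., memory 114) and position N-1 for the last memory (e.g., memory 108), and the propagation of a cleared enable bit is modeled simply by clearing one more entry of `enables` per clock.

```python
def piso_convert(data, preset=0):
    """Simulate one gated-clock conversion; return (serial output, pulses per memory)."""
    n = len(data)
    regs = [preset] * n      # regs[0] drives the serial output
    enables = [True] * n     # clock enable bits, all asserted before the load
    pulses = [0] * n         # clock pulses actually delivered to each memory
    serial_out = []
    for cycle in range(n):
        load = (cycle == 0)  # assumption: the parallel load occurs on the first clock
        nxt = list(regs)     # compute next state from current state (synchronous update)
        for p in range(n):
            # The first memory is modeled as ungated; the others follow their enable bit.
            if p != 0 and not enables[p]:
                continue     # clock withheld: the memory keeps its old value
            pulses[p] += 1
            if load:
                nxt[p] = data[p]
            else:
                nxt[p] = regs[p + 1] if p + 1 < n else preset
        regs = nxt
        serial_out.append(regs[0])
        # One more enable bit is cleared per clock, starting at the last memory.
        enables[n - 1 - cycle] = False
    return serial_out, pulses


out, pulses = piso_convert(["D0", "D1", "D2", "D3"])
```

For the four element example, `out` is the loaded data in serial order and `pulses` is `[4, 3, 2, 1]`, so the memories that are no longer clocked do not disturb the serialized output.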

Note that the first memory 114 in the serial chain may receive every clock pulse on the serial clock input 132. In effect, the clock signal for the first memory 114 is ungated. The ungated clock may be applied because the first memory 114 always has an element of data to pass to the serial output 118, assuming that no more than N clocks are applied to the N element deep shift register before it is reloaded. In other implementations, for example, the clock signal 140 for the first memory 114 may be gated off after all N elements have been shifted out and until a new N element parallel load occurs.

Expressed another way, the circuit 100 includes a shift register 102 with a serial output 118 and buffers (e.g., flip-flops) configured to serially propagate the elements (e.g., data bits) stored in the buffers to the serial output 118. A parallel load input 116 provides the elements in parallel to load into the buffers. Clock control logic 104 is in communication with the buffers to control shifting of the elements through the shift register 102. The clock control logic 104 is configured to issue propagation signals to those buffers that will next store an element included in the elements provided on the parallel load input 116 and that has not already been stored. The clock control logic 104 also withholds propagation signals from those buffers that would receive an element that those buffers have already stored and output to the next memory in series.

FIG. 2 shows an example 200 of W element wide, N element deep, parallel to serial shift register control. In the example 200, there are W serial channels (e.g., 202, 204, 206, 208). Each serial channel may be implemented as a shift register 102. Thus, each serial channel includes an N element parallel load input (e.g., 210) and a serial output (e.g., 212). A common parallel load enable input 128 may be provided to each serial channel.

Note that one instance of clock control logic 214 is present. The clock control logic 214 may be implemented as the clock control logic 104 shown in FIG. 1. The clock control logic 214 provides gated clock signal outputs 216 to each instance of the shift registers that implement the serial channels. In the manner shown in FIG. 1, the gated clock signal outputs 216 selectively provide and withhold clock signal pulses to the shift register memories in the serial channels.

When one instance of the clock control logic 214 controls W serial channels, an approximate power saving may be determined according to:

$C_{T} = N*N*W = N^{2}W$

which represents the total clock cycles applied to memory elements in order to propagate W sets of N element deep parallel loaded data to serial outputs (N * W memory elements, clocked N times). Then:

$C_{G} = W*\sum\limits_{k = 0}^{N - 1} k = \frac{N\left( N - 1 \right)W}{2}$

represents the total clock cycles per parallel to serial conversion that are saved (e.g., gated off) by the clock control logic. The clock control logic 104 adds some clock cycles for clock control:

$C_{A} = (N - 1) + 1 = N$

which represents the total clock cycles per parallel to serial shift conversion applied to the clock control logic 104 and the parallel load circuit 106. An approximate power saving may then be expressed (in terms of clock cycles) as:

$\begin{matrix}{P_{S} = {\frac{C_{G} - C_{A}}{C_{T}} = \frac{\frac{{N\left( {N - 1} \right)}W}{2} - N}{N^{2}W}}} \\{= {\frac{1}{2} - \frac{1}{2\; N} - \frac{1}{NW}}}\end{matrix}$

As N and W grow larger, the power savings asymptotically approaches a maximum of 50%. As a specific example, when N=8 and W=12, the power saving is approximately 43%.
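The approximate power saving may be checked numerically. The following Python sketch evaluates the clock cycle counts from their definitions and compares the result with the closed form expression; the function names `power_saving` and `power_saving_closed_form` are hypothetical, and exact fractions are used only to avoid rounding in the comparison.

```python
from fractions import Fraction


def power_saving(n, w):
    """Approximate saving P_S = (C_G - C_A) / C_T, in terms of clock cycles."""
    c_t = n * n * w            # total clock cycles without gating
    c_g = w * sum(range(n))    # gated-off clock cycles, W * (0 + 1 + ... + (N-1))
    c_a = (n - 1) + 1          # clock cycles added for the clock control
    return Fraction(c_g - c_a, c_t)


def power_saving_closed_form(n, w):
    """Closed form 1/2 - 1/(2N) - 1/(NW)."""
    return Fraction(1, 2) - Fraction(1, 2 * n) - Fraction(1, n * w)


saving = power_saving(8, 12)
print(f"{float(saving):.1%}")  # about 43% for N=8, W=12
```

For N=8 and W=12 the exact value is 41/96, which agrees with the approximately 43% figure, and larger N and W move the value toward, but never reach, 50%.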

FIG. 3 shows an example timing diagram 300. In this example, a 4 element deep (N=4) shift register converts parallel loaded data elements: {D0, D1, D2, D3}, {D4, D5, D6, D7}, and {D8, D9, D10, D11}, to serial output: {D0, D1, D2, D3, D4, D5, D6, D7, D8, D9, D10, D11}. The data elements may be data bits or any other unit of information, and they may constitute an aggregation (e.g., a nibble of a byte), but they need not constitute an aggregation (e.g., the four bits may come from unrelated data entities).

A high level view 302 of the shift register shows the first conversion, with {D0, D1, D2, D3} loaded into memories M3, M2, M1, and M0, then converted to serial data over four clock cycles. This conversion corresponds to the timing section 304 in the timing diagram 300, with the parallel load enable signal 128 going active to start the parallel load. The gated clock signals for M3, M2, and M1 and the ungated clock signal for M0 show how the clock signals convert the parallel data to serial. In particular, the gated clock signals M3, M2, and M1 show the selective application of clock signals to those memories that have yet to receive the particular data element of the parallel loaded data that is currently present at their input.

With respect to M1, as an example, on the first clock, M1 has not yet received D1, and M1 is provided with a clock pulse (306). On the next clock, M1 has not yet received D2, and M1 is again provided with a clock pulse (308). Similarly, on the next clock M1 has not yet received D3, and M1 receives another clock pulse (310). However, on the next clock, M1 has already received element D3, and the clock control logic 104 does not provide a clock pulse to M1 (312). The memory M3 receives one clock pulse to accept D3, but the clock control logic 104 provides no further clock pulses to M3 because M3 has received every data element of the parallel loaded data that will be present at its input. In FIG. 3, the ‘Enable Serial Clock FF (1), (2), and (3)’ signals show how clock gating propagates through the clock control logic 104, as the data itself propagates.
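The pattern of provided and withheld clock pulses in this example may be tabulated with a short Python sketch. The function name `clock_schedule` is hypothetical, and the sketch models only which memories are clocked on each serial clock of one conversion, not the signal timing of FIG. 3.

```python
def clock_schedule(n):
    """For each serial clock of one conversion, list the memory positions that are clocked."""
    # On clock c (counting from 0), positions 0 .. n-1-c still have a loaded
    # element arriving at their input, so only those positions are clocked.
    return [list(range(n - c)) for c in range(n)]


for clock, positions in enumerate(clock_schedule(4), start=1):
    print(f"clock {clock}: " + " ".join(f"M{p}" for p in positions))
```

For N=4 the listing shows M1 clocked on the first three clocks and not on the fourth, and M3 clocked only on the first clock.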

Expressed another way, the clock control logic 104 provides N-p clock pulses to a memory at position p in the serial chain during a parallel to serial conversion. The parameter ‘p’ ranges from 0 for the first memory in the series (e.g., M0, p=0) to N-1 for the last or Nth memory in the series (e.g., M3, p=3).
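The N-p relationship also accounts for the number of clock cycles gated off per serial channel. The Python sketch below, with the hypothetical function names `pulses_delivered` and `pulses_withheld_per_channel`, counts the pulses each memory receives and confirms that the withheld pulses in one channel total N(N-1)/2.

```python
def pulses_delivered(n):
    """Clock pulses provided to the memory at each position p = 0 .. N-1."""
    return [n - p for p in range(n)]


def pulses_withheld_per_channel(n):
    """Clock pulses gated off in one channel during one N-clock conversion."""
    # Each memory would receive n pulses without gating, so position p saves p pulses.
    return sum(n - delivered for delivered in pulses_delivered(n))


delivered = pulses_delivered(4)              # [4, 3, 2, 1] for M0 .. M3
withheld = pulses_withheld_per_channel(4)    # 0 + 1 + 2 + 3
```

Multiplying the per channel total by W gives the gated-off clock cycle count used in the power saving estimate.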

FIG. 4 shows example logic 400 for one instance of converting parallel loaded data elements to a serial output, with power saving clock control. The logic includes providing an ungated serial clock input signal (402) and pre-parallel loading clock enable bits into clock control shift register memories (404). The clock enable bits may correspond to the pre-parallel load enable signal 134, and may all be asserted for the first shift operation. After the pre-parallel load enable signal 134 goes inactive, inactive values are shifted in for the clock enable bits. Subsequent to loading the clock enable bits, the logic 400 parallel loads N elements into the shift register 102 (406), for example using a parallel load enable input 128.

In the clock control logic 104, the clock gating logic 154 generates gated clock signals for selected shift register memories (408). Further, the clock control logic 104 may provide a gated or an ungated clock pulse to the first memory in the shift register 102 (e.g., the memory 114) (410). With each clock cycle, the logic 400 shifts the clock enable bits in the clock control logic 104 (410).

Various implementations have been specifically described. However, many other implementations are also possible. For example, multiplexers such as the multiplexer 152 may be replaced with combinatorial logic. Further, the logic shown above may be implemented as a standard cell stored in a design library such as a Very Large Scale Integration (VLSI) library, or implemented (e.g., for simulation purposes) as instructions for execution by a processor. Accordingly, the standard cell may be placed wherever a lower power PISO shift register is desired in a digital circuit design, and may form part of a library of cells delivered with VLSI design software.

What is claimed is:
 1. A device comprising: a multiple-bit memory comprising serially connected elements; and clock circuitry configured to withhold clock signals from an increasing number of the serially connected elements as a parallel-to-serial conversion operation proceeds through the multiple-bit memory.
 2. A device comprising: memory elements coupled together to perform a data conversion operation on a multiple-bit input; and clock circuitry coupled to the memory elements, the clock control circuitry configured to: generate clock signals for the memory elements for performing the data conversion operation on the multi-bit input; and determine which clock signals to generate according to which memory elements hold data from the multi-bit input yet to be converted.
 3. A method comprising: generating a decreasing number of clock signals to serially coupled memory stages as a parallel-to-serial conversion operation progresses through the serially coupled memory stages.